Periodic fluctuations in reading times reflect multi-word-chunking

Memory is fleeting. To avoid information loss, humans need to recode verbal stimuli into chunks of limited duration, each containing multiple words. Chunk duration may also be limited neurally by the wavelength of periodic brain activity, so-called neural oscillations. While both cognitive and neural constraints predict some degree of behavioral regularity in processing, this remains to be shown. Our analysis of self-paced reading data from 181 participants reveals periodic patterns at a frequency of ∼2 Hz. We defined multi-word chunks by using a computational formalization based on dependency annotations and part-of-speech tags. Potential chunk outputs were first generated from the computational formalization and the final chunk outputs were selected based on normalized pointwise mutual information. We show that behavioral periodicity is time-aligned to multi-word chunks, suggesting that the multi-word chunks generated from local dependency clusters may minimize memory demands. This is the first evidence that sentence processing behavior is periodic, consistent with a role of both memory constraints and endogenous electrophysiological rhythms in the formation of chunks during language comprehension.

In the current study, we report for the first time that self-paced reading (SPR) data indeed also contain periodic patterns at a frequency that is consistent with periodic neuronal processes previously associated with chunking. Specifically, we show that these patterns align with chunk boundaries defined by our computational formalization. Computational formalization offers several fruitful approaches to defining multi-word chunks. Different methods adopted for word tagging in an information extraction system might generate inconsistent output chunks 23,24. Here we define multi-word chunks independently by using a computational formalism based on dependency annotations [25][26][27], combined with the classic approach of a word-tagging system with the BIO tagset, where B marks the beginning of a chunk, I the inside of a chunk, and O the outside of a chunk 28. This approach yields chunks that for the most part align with major syntactic boundaries, as exemplified in Fig. 1 (see Chunking Algorithm for details). These linguistically grounded chunk boundaries are established by finding the optimal set of sub-trees in a dependency tree. Often sentences can be chunked in more than one way, so we use an information-theoretic procedure based on dependency relations and part-of-speech tags to prioritize more likely chunk candidates. Fundamentally, this means that the more often part-of-speech tags are connected to one another via specific dependency relations in a corpus, the more likely they will form a chunk in a given sentence.
Our findings provide the first behavioral evidence that reading behavior is regular at a slow time scale, consistent with both memory constraints on multi-word chunking and an involvement of rhythmic electrophysiological processes in the generation of multi-word chunks. Particularly, this periodic behavior seems to be relevant for the cognitive formation of multi-word units during higher-level language processing and may minimize memory demands.

SPR times are periodic < 4 Hz
We applied frequency-domain time-series analysis to wrap-up effects 29 in N = 181 openly accessible SPR data sets 30. To highlight wrap-up effects, we differenced the raw SPR time series, effectively amplifying transitions from slowdown to speedup across subsequent words (see Fig. 8B). For statistical analysis, we took a permutation approach (see Data Analysis). This revealed a peak around 2 Hz (see Fig. 2). Specifically, at 1.75, 2, and 2.25 Hz, the t-value of the one-sample t-test on the observed power estimates exceeded the 950th entry of the sorted distribution of t-values from tests on 1000 PSD spectra resulting from permutations of the differenced data, corresponding to an uncorrected one-tailed p < 0.05. After Bonferroni correction for the 100 query frequencies, this remained significant (p < 0.001, corrected) at 2 Hz. These results suggest that natural, unconstrained reading slows down and then speeds up at a period of 0.5 s.

Periodicity relates to chunking
To obtain chunks, we employ a computational model that defines them as sequences of words and bound morphemes that allow for all local dependencies to be established 25 (for an example, see Fig. 7). The psychological reality of chunk boundaries has been demonstrated with different experimental paradigms (e.g., a click paradigm in which participants listen to sentences containing clicks and indicate where the clicks occurred [31][32][33]) and analysis techniques (e.g., a hierarchical clustering scheme in which data are grouped by measures of relatedness and then mapped onto the hierarchical structure 34,35). The approach we adopted here is analogous to formal linguistic definitions 26 and resonates with classical phrase-structural approaches to chunking 27. Note that the locality of these chunks implicitly minimizes memory demands, which are widely viewed as a key constraint on dependency processing 36.
To link periodic slowdown-speedup transitions to chunking, we first detected positive peaks in the differenced time series. These peaks mark major transitions from slow to fast reading times. We then performed mixed-effects logistic regression analyses to assess whether the occurrence of a turning point depended on the presence of a chunk boundary. To stay consistent with prior literature on wrap-up effects, our boundary factor included not only a level "chunk" for chunk boundaries, but also a level "sentence" for sentence boundaries; for comparison, a level "non-boundary" marked words that did not occur at either type of boundary. We first fitted a baseline model including an intercept, fixed effects of word frequency and word form surprisal, and random effects of subject and story. The baseline model was then compared to a model adding the boundary factor. Inclusion significantly improved model fit above baseline (χ²(2) = 588.82, p < 0.001). Analogous comparisons for subsets revealed significant model improvement within all condition pairs (sentence and chunk: χ²(1) = 371.30, p < 0.001; sentence and non-boundary: χ²(1) = 597.01, p < 0.001; chunk and non-boundary: χ²(1) = 19.18, p < 0.001; Fig. 3). This means that the transitions from slow to fast reading times occurred more often at sentence boundaries relative to both chunk boundaries and non-boundary words, and more often for chunk boundaries relative to non-boundary words.

SPR slows down within chunks
To substantiate the relevance of slowdown-speedups for chunking, we further assessed the progression of reading times within chunks, with the hypothesis that reading times increase gradually as readers approach the end of a chunk. To this end, within each chunk, we fitted a linear model that predicted reading time from word position. We then extracted the slope for each chunk and entered all slopes as a dependent measure into a new model with an intercept only, plus random factors for subject and story. There was a significant positive effect of the model intercept (t(11.98) = 4.34, p < 0.001, Satterthwaite-approximated degrees of freedom; Fig. 4). This suggests that reading times increase across word positions within chunks.
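The two-level logic can be illustrated with a minimal Python sketch. The function name `within_chunk_slope` and the example reading times are hypothetical, and the plain average at the end only stands in for the intercept-only mixed model with random effects described above; it is not the analysis that produced the reported statistics.

```python
def within_chunk_slope(reading_times):
    """Ordinary least-squares slope of reading time on word position (1..n)."""
    n = len(reading_times)
    if n < 2:
        raise ValueError("a slope needs at least two words")
    positions = range(1, n + 1)
    mean_pos = (n + 1) / 2
    mean_rt = sum(reading_times) / n
    sxy = sum((p - mean_pos) * (rt - mean_rt)
              for p, rt in zip(positions, reading_times))
    sxx = sum((p - mean_pos) ** 2 for p in positions)
    return sxy / sxx


# Hypothetical log reading times for three chunks of one subject and story.
chunks = [[5.6, 5.7, 5.9], [5.8, 5.8], [5.5, 5.7, 5.6, 5.9]]

# First level: one slope per chunk.
slopes = [within_chunk_slope(c) for c in chunks]

# Second level (simplified): a positive mean slope indicates that reading
# times tend to increase from chunk onset to chunk offset.
mean_slope = sum(slopes) / len(slopes)
```

A positive `mean_slope` corresponds to the direction of the effect reported here; testing it properly would require the random-effects structure that this sketch leaves out.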

Discussion
Our analyses provide the first behavioral evidence that higher-level language comprehension, specifically the formation of multi-word memory chunks, is a periodic behavior. Previous work in cognitive neuroscience has linked slow periodic neural activity to eye movements during reading 21 and the formation of multi-word chunks during language comprehension 11,13,14. The current results suggest that this is indeed behaviorally relevant for language processing. Readers slow down and then speed up roughly every 0.5 s, mostly at sentence and chunk boundaries. They also show a gradual increase in SPR times from chunk onset to offset, which may indicate the incremental integration of words into a multi-word unit that progressively increases in size, consistent with neurophysiological evidence for a gradual increase of electrophysiological activity towards the end of each multi-word unit within a sentence 37. Periodicity may thus reflect chunking. Classical wrap-up effects could reflect the periodic formation of multi-word chunks 29.
The tendency of SPR transitions to occur at the boundaries of sentences and chunks links rather well to prior psycholinguistic proposals. In particular, it has been argued that memory constraints limit the distance of dependencies between the words and bound morphemes of sentences 36,38. Here, we provide a complementary hypothesis inspired by electrophysiology: If the wavelength of neural oscillations limits the duration of chunks, it would implicitly enforce short dependency distances to allow for dependency formation within the current memory chunk 1,27. Nevertheless, the current data cannot dissociate this syntactic approach from perceptual notions of chunking. Likely, many of the chunk boundaries as defined here align with implicit prosodic boundaries. In spoken language, there is a strong alignment between syntactic and prosodic boundaries [39][40][41]. In the absence of prosodic markings, both listeners and readers generate implicit prosodic structure to guide perceptual sampling 42. Moreover, implicit prosody is also reflected in periodic brain activity at delta-band frequency 43. We embrace the classical view that perceptual sampling in time windows that cover multiple words and the formation of fine-grained dependency structure amongst these words go hand in hand 7. Future research needs to investigate how such a staged architecture maps onto periodic neural and cognitive processes.
The current findings could provide an initial hint at a possible relationship between periodic slowdowns in reading and the periodicity of the electrophysiology of chunking. M/EEG studies have argued that delta-band oscillations (< 4 Hz) reflect the grouping of words into larger units 11,13,14. Consistent with this, we observe a spectral peak at 2 Hz in reading times. Strikingly, the SPR data analyzed here do not contain any physical rhythm or boundaries. As chunk boundaries are not marked visually, they must be set by some cognitive heuristic 31,44. Yet, given that the current study did not assess concurrent M/EEG in addition to behavioral responses, it remains to be shown that behavioral periodicity indeed stems from endogenous neuronal rhythms that synchronize with higher-level linguistic information [45][46][47].
The current chunking formalism operates within the framework of dependency grammar 48, which does not explicitly assume a hierarchical syntactic structure. Different types of cognitive units above the single-word level have been linked to periodic brain activity (for discussion, see [49][50][51]; some hierarchical, some not 11,13,14). In principle, from the current results, we may only claim that the size of chunks may relate to the size of a neural processing window in the delta-band. The microstructure of syntax and syntactic processing within chunks is beyond the scope of the current work.

Conclusion
Readers speed up and slow down periodically at a period of 0.5 s. These transitions may indicate the formation of multi-word chunks that allow for establishing all dependencies amongst the words and bound morphemes held in working memory at a time. Multi-word chunking is a periodic behavior, possibly mirroring underlying rhythmic neuronal processes.

Data
We analyzed a set of openly accessible self-paced reading (SPR) data from 181 native speakers of English 30. Participants had been instructed to read 10 stories from the Natural Stories Corpus word by word, advancing through button press. The reading time was measured for each word from the presentation onset to the button press. Each story includes roughly 1000 words, which results in 10,245 words and 485 sentences in total. The text was automatically parsed using the Stanford Parser 52. The output from the parser was manually corrected and automatically converted to the annotations for the Universal Dependencies (UD) by Futrell et al. 30, so the data have high-quality human-verified UD annotations.

Chunking algorithm
The processes described below are applied to the human-verified UD annotations of the Natural Stories Corpus. The chunks require no generalization as they, and the statistics used to derive them, are drawn directly from the UD annotations. We define chunks as sequences of words and bound morphemes that form saturated local dependency clusters [25][26][27]. The chunking algorithm employs dependency annotations and part-of-speech tags 48. Specifically, chunks are considered base-level subtrees, allowing for a language-agnostic definition and annotation. This means the core algorithm is based on subtrees with a depth of 1. However, this restriction is softened to allow for chunks with a depth of 2 to minimize unitary chunks using a simple heuristic as described below.
As a first step, potential candidate chunks are extracted. For a given sentence and its corresponding tree, the span between each node n at position x and its corresponding head h at position k (where k can be greater than or less than x) is considered a candidate chunk if the nodes between position x and k − 1 (if k > x) or between k + 1 and x (if k < x) all have the same head h. This process results in potentially overlapping chunks (e.g., the head of one chunk could be a dependent in another). To select the optimal chunk annotation for a given tree, each chunk is scored based on normalized pointwise mutual information (NPMI) 53. We use the NPMI between the Universal part-of-speech (UPOS) tag of a node (t) and the tuple of the UPOS of the head of that node (ht) and the relation between the node and its head (rel), such that for a given node:

$$\mathrm{NPMI}(t; (ht, rel)) = \frac{\ln \dfrac{p(t, (ht, rel))}{p(t)\, p(ht, rel)}}{-\ln p(t, (ht, rel))}$$

and the average NPMI of a potential chunk is:

$$\mathrm{NPMI}_{C} = \frac{1}{N} \sum_{d \in C} \mathrm{NPMI}(d_{t}; (d_{ht}, d_{rel}))$$

where N is the number of nodes in a phrase and d is a dependent in a phrase C. The potential chunks in a given tree are then selected greedily. That is, the potential chunks are ordered by their NPMI and the highest-scoring one is selected first, with any conflicting chunk annotations being removed. This is repeated until no potential chunk labels are left.
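The candidate extraction, scoring, and greedy selection steps described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: all function names are hypothetical, the toy sentence and its head indices are invented, and the four spans in the last example are made up so that only their NPMI values (taken from the Fig. 7 example) and the overlap between C2 and C3 match the paper.

```python
import math


def npmi(p_joint, p_tag, p_head_rel):
    """Normalized PMI between a tag and a (head tag, relation) tuple.

    Assumes 0 < p_joint < 1; probabilities would be estimated from
    treebank counts.
    """
    return math.log(p_joint / (p_tag * p_head_rel)) / -math.log(p_joint)


def candidate_chunks(heads):
    """heads[i] is the 0-based head index of token i (None for the root)."""
    spans = set()
    for x, k in enumerate(heads):
        if k is None:
            continue
        # Every token from x up to (but excluding) the head must share head k.
        between = range(x, k) if k > x else range(k + 1, x + 1)
        if all(heads[i] == k for i in between):
            spans.add((min(x, k), max(x, k)))
    return sorted(spans)


def greedy_select(scores):
    """scores maps (start, end) spans to their average NPMI."""
    chosen = []
    for span in sorted(scores, key=scores.get, reverse=True):
        # Keep a span only if it does not overlap an already chosen one.
        if all(span[1] < s[0] or s[1] < span[0] for s in chosen):
            chosen.append(span)
    return sorted(chosen)


# Toy sentence "the old dog barked loudly" with invented head indices.
heads = [2, 2, 3, None, 3]
candidates = candidate_chunks(heads)

# NPMI values from the Fig. 7 example (C1..C4); the spans are invented.
fig7 = {(0, 1): 0.22, (2, 3): 0.23, (3, 5): 0.31, (6, 8): 0.47}
selected = greedy_select(fig7)
```

In the last example, the span scored 0.23 is discarded because it overlaps the higher-scoring span scored 0.31, mirroring the removal of C2 in the Fig. 7 walkthrough.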
This process results in a large number of unitary chunks (i.e., chunks with only one node), which are unlikely to echo the multi-word units of natural language. To rectify this, two simple heuristics were applied. The first removes superfluous punctuation (superfluous with respect to the syntactic tree). Punctuation is only removed if a node has a UPOS tag of PUNCT and has no dependents. An example is shown in Fig. 5.
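A minimal Python sketch of this first heuristic follows. The token representation (dictionaries with CoNLL-U-style `id`, `upos`, and `head` fields) and the function name `drop_childless_punct` are assumptions made for illustration, and the example sentence is invented.

```python
def drop_childless_punct(tokens):
    """Remove PUNCT tokens that no other token points to as its head."""
    heads_in_use = {tok["head"] for tok in tokens}
    return [
        tok for tok in tokens
        if not (tok["upos"] == "PUNCT" and tok["id"] not in heads_in_use)
    ]


# Invented example with 1-based ids; head 0 marks the root.
sentence = [
    {"id": 1, "form": "Stop", "upos": "VERB", "head": 0},
    {"id": 2, "form": "it", "upos": "PRON", "head": 1},
    {"id": 3, "form": "!", "upos": "PUNCT", "head": 1},
]
kept = drop_childless_punct(sentence)
```

The sketch leaves token ids untouched, so a downstream step would still need to handle the gaps left by removed tokens.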
The second heuristic attaches floating unitary nodes to chunks. This in effect removes the single-depth restriction on chunks, which was only introduced to simplify the original engineering use of this method. If a unitary chunk occurs at the boundary of a multi-token chunk and is syntactically linked to any element in that chunk (i.e., is the head or dependent of a node in the chunk), it is included in that chunk and the annotation is updated. Punctuation is treated slightly differently. Any punctuation nodes that remain after applying the first heuristic are considered part of a chunk if they satisfy the boundary condition (with the syntactic criterion ignored), as the punctuation does not impact the analysis. An example is shown in Fig. 6. Then the derived chunks are viewed as components, which can be a word or a sequence of words that takes into account inter-word relationships such as precedence and dominance. The overall process of generating chunk outputs is summarized in Fig. 7.
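The attachment rule and the resulting labels can be illustrated with a short Python sketch based on the "My schedule shows" example from Fig. 6. The function names are hypothetical, the head indices are assumed rather than read off the figure, and the sketch checks a single chunk only; it does not reproduce the ordering of attachment targets by NPMI or the special handling of punctuation.

```python
def attach_unitary(i, chunk, heads):
    """Merge token i into chunk if it is adjacent and syntactically linked."""
    start, end = chunk
    members = set(range(start, end + 1))
    adjacent = i == start - 1 or i == end + 1
    # Linked: token i depends on a chunk member, or a member depends on it.
    linked = heads[i] in members or any(heads[m] == i for m in members)
    return (min(i, start), max(i, end)) if adjacent and linked else chunk


def bio_tags(n_tokens, chunks):
    """Label tokens B (chunk-initial), I (chunk-internal), or O (outside)."""
    tags = ["O"] * n_tokens
    for start, end in chunks:
        tags[start] = "B"
        for j in range(start + 1, end + 1):
            tags[j] = "I"
    return tags


# Assumed heads: My -> schedule, schedule -> shows, shows is the root.
heads = [1, 2, None]
before = bio_tags(3, [(0, 1)])
merged = attach_unitary(2, (0, 1), heads)
after = bio_tags(3, [merged])
```

Before attachment, the unitary token is labelled O; after attachment, it becomes an I inside the extended chunk.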

Data analysis
Preprocessing and spectral analysis were performed in MATLAB® (The MathWorks, Inc., US); statistical analysis was performed in R 54. The authors of the original corpus suggested trimming reading times outside a range of 100-3000 ms 30. Because the complete removal of word reading times would have disrupted the spectral analysis, and thereby, the actual pace of reading, we kept the original latency of each button press relative to story onset and replaced data values outside the range of 100-3000 ms with a median value. The median was calculated within subject and story. Imputation affected 6% of data values. Reading times were then log-transformed to achieve a normal distribution (Fig. 8A). To highlight chunking-related slowdown-speedup transitions in the data, we performed differencing on the imputed vector of reading times (Fig. 8B). This decision was based on prior evidence for reading-time slowdowns at the end of clauses and sentences 55,56 (for review, see 29) and independent evidence from visual chunking in non-human primates that observed changes in reaction times at chunk boundaries 57. In the differenced vector, local maxima reflect transitions from slowdowns to speedups between adjacent words (Fig. 8C). After differencing, data were converted to a time series sampled at 1000 Hz.
The original latency of each button press in milliseconds relative to story onset served as index, the log-transformed reading time served as value.
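The preprocessing steps described above (imputation, log transformation, differencing, and peak detection) can be sketched in Python as follows. This is an illustrative sketch rather than the MATLAB code used for the analysis: the function names and reading times are hypothetical, the median is assumed to be taken over all values of one subject-story vector, and the conversion to a 1000 Hz time series is omitted.

```python
import math
from statistics import median


def impute_and_log(rts_ms, low=100, high=3000):
    """Replace out-of-range reading times with the median, then take logs."""
    med = median(rts_ms)  # assumption: median over all values of the vector
    return [math.log(rt if low <= rt <= high else med) for rt in rts_ms]


def difference(values):
    """First-order differences between adjacent words."""
    return [b - a for a, b in zip(values, values[1:])]


def positive_local_maxima(values):
    """Indices of interior points that are positive and exceed both neighbors."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i] > 0 and values[i - 1] < values[i] > values[i + 1]
    ]


# Hypothetical reading times (ms) for one subject and story.
rts = [300, 320, 5000, 310, 450, 290, 305]
logged = impute_and_log(rts)
diffed = difference(logged)
peaks = positive_local_maxima(diffed)
```

Here the 5000 ms value is replaced by the median of 310 ms before the log transform, and the single detected peak marks the largest increase between adjacent words.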
The time series within subject and story then underwent short-time Fourier transform using Welch's power spectral density (PSD) estimation (window length = 4168 samples, overlap = 2084 samples, frequency resolution = 0.1 Hz); PSD was converted to power (see Fig. 2 for results). Statistical analysis employed a permutation approach: First, observed spectra were averaged within subject across stories and a one-sample t-test was performed across subjects within frequency bin; the t-statistic was noted. Second, a distribution of estimates for comparison was generated: Within story and subject, 1000 random time series were generated by randomly permuting the differenced values and inserting them at the observed indices. Spectra were averaged within permutation run and subject across stories; within run, a one-sample t-test was performed across subjects within frequency bin. Third, within frequency bin, we sorted the test statistics from the permuted data and assessed whether the observed statistic would surpass the 950th value, corresponding to one-tailed p < 0.05 58, and then Bonferroni-corrected for the 100 query frequencies.
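The decision rule of the permutation approach can be illustrated with a small Python sketch. It is a deliberately simplified stand-in: instead of across-subject t-values on Welch spectra, it uses the power of a single discrete Fourier transform bin of one evenly sampled toy series as the test statistic, and it omits the Bonferroni correction. All function names and the toy series are hypothetical.

```python
import cmath
import math
import random


def power_at(values, k):
    """Power of the k-th DFT bin of an evenly sampled series."""
    n = len(values)
    coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                for i, v in enumerate(values))
    return abs(coeff) ** 2 / n


def permutation_null(values, statistic, n_perm=1000, seed=0):
    """Statistic recomputed on randomly permuted copies of the series."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)
        null.append(statistic(shuffled))
    return null


def exceeds_threshold(observed, null, rank=950):
    """True if the observed statistic surpasses the rank-th sorted null value."""
    return observed > sorted(null)[rank - 1]


# Toy series that alternates on every sample, i.e., is perfectly periodic.
series = [1.0, -1.0] * 16


def stat(values):
    return power_at(values, len(values) // 2)


observed = stat(series)
null = permutation_null(series, stat)
significant = exceeds_threshold(observed, null)
```

Permuting the values preserves their distribution but destroys their temporal order, so a statistic that survives this comparison reflects periodic structure rather than the mere spread of the values.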
To relate slowdown-speedup transitions to chunking, we performed mixed-effects logistic regression analyses using the lme4 package 59 in R. Words at sentence boundaries were defined as words followed by a period, question/exclamation mark, comma, or (semi)colon. Words at chunk boundaries were defined by the chunker. This means that sentence boundaries and chunk boundaries were mutually exclusive. Non-boundary words were all remaining words (Fig. 8D). Baseline and improvement models were compared using Analysis of Variance. Word frequency was determined with the wordfreq module in Python; word form surprisal was calculated using the minicons module in Python, based on GPT2 60. Frequency and surprisal were included as nuisance regressors in all models because of their well-known influence on processing effort (for review, see 61,62). Before inclusion, frequency and surprisal were scaled and centered. As a second strategy for relating slowdown-speedup transitions to chunking, we assessed the progression of reading times within chunks. We took a two-level approach: First, within chunk, we linearly regressed reading time on word position. Second, regression slopes (i.e., β coefficients within chunk) were entered as dependent measure into a linear mixed model, fixed effect being only an intercept, random effects being subject, story, and chunk length (i.e., number of words within chunk); note that a random-slope model failed to converge.

Figure 1. Formalization of chunks: chunks (green boxes, bottom) are automatically extracted based on the dependency relations and part-of-speech tags of a given sentence. Sentence examples are from UD English EWT treebank.

Figure 3. Percentage of relative maxima of differenced vector of log-transformed reading times (mean ± standard error) at sentence endings (purple), chunk endings (green), and non-boundary words (yellow); asterisks mark statistical significance at p < 0.001 in paired-samples t-tests.

Figure 5. The top tree shows the original tree with two punctuation tokens. The nodes corresponding to tokens '?' and '-' (highlighted in magenta) are removed as they are tagged as PUNCT and neither has any dependents. The corresponding edges (also in magenta) are also cut from the tree. The resulting tree is shown below.

Figure 6. Labelling of chunks: B, beginning of the chunk; I, inside the chunk; O, outside the chunk. The unitary chunk formed of the token "shows" (highlighted in magenta in the top tree) is labelled O. Possible chunks to attach it to are tried in order of their NPMI. This results in the unitary chunk being appended to the chunk consisting of "My" and "schedule". The resulting chunk forms "My schedule shows".

Figure 7. Applying the chunker algorithm to a UD dependency tree. R1 shows the tokens of the sentence with the corresponding dependency tree above it. R2 shows the corresponding part-of-speech tags for each token. Then candidate chunks are found based on the dependencies of each of the tokens, such that each candidate chunk has only one level of depth and is continuous. The candidate chunks are highlighted in R3. There is often more than one way to apply candidate chunks for a given sentence, as is the case in this example. The optimal combination is then selected based on the average NPMI of each candidate chunk, which is calculated using the dependency relation types and part-of-speech tags of each token in a given chunk and the corresponding statistics from the treebank. The candidate chunks are sorted by NPMI and then selected greedily, with other candidate chunks being removed from the list if they conflict with the chosen chunk. In the example here, the candidate chunks would be sorted as: C4 (0.47), C3 (0.31), C2 (0.23), and C1 (0.22), with their NPMI values in parentheses. So C4 would be selected first, followed by C3. Because C2 is no longer viable as it overlaps with C3, it is removed from the list of candidate chunks, leaving just C1, which is then selected. The full stop (period) is removed from the process as it has no dependencies (see Fig. 5). R4 highlights the annotated chunks resulting from this process, with R5 showing the corresponding chunk labels.

Figure 8. Data processing steps: (A) time series of log-transformed raw reading times from roughly 22 s of data from an example story and subject; each value is plotted at its corresponding latency relative to story onset; (B) differenced time series; (C) differenced time series with local maxima marked (red); (D) latencies of words at sentence boundaries (S; purple), chunk boundaries (C; green), and non-boundary words (N; yellow).
The algorithm fundamentally gives a solution to finding base-level subtrees when more than one solution exists.